Abstract. We propose a new linear-size data structure which provides fast access to all palindromic substrings of a string or a set of strings. This structure inherits some ideas from the construction of both the suffix trie and the suffix tree. Using this structure, we present simple and efficient solutions for a number of problems involving palindromes.


1 Introduction 

Palindromes are one of the most important repetitive structures in strings. During the last decades they have been actively studied in formal language theory, combinatorics on words, and stringology. Recall that a palindrome is any string $S = a_1 a_2 \cdots a_n$ equal to its reversal $\overline{S} = a_n \cdots a_2 a_1$.

There are many papers concerning the palindromic structure of strings. The most important problems in this direction include the search and counting of palindromes in a string and the factorization of a string into palindromes. Manacher [?] came up with a linear-time algorithm which can be used to find all maximal palindromic substrings of a string, along with its palindromic prefixes and suffixes. The problem of counting and listing distinct palindromic substrings was solved offline in [5] and online in [11]. Knuth, Morris, and Pratt [?] gave a linear-time algorithm for checking whether a string is a product of even-length palindromes. Galil and Seiferas [1] asked for such an algorithm for the k-factorization problem: decide whether a given string can be factored into exactly k palindromes, where k is an arbitrary constant. They presented an online algorithm for k = 1, 2 and an offline one for k = 3, 4. An online algorithm working in O(kn) time for a string of length n and any k was designed in [12]. Closely related to the k-factorization problem is the problem of finding the palindromic length of a string, which is the minimal k in its k-factorization. This problem was solved by Fici et al. in O(n log n) time [3]. In this paper we present a new tree-like data structure, called eertree¹, which allows one to simplify and speed up solutions to search, counting and factorization problems, as well as to several other palindrome-related algorithmic problems. This structure can also cope with Watson-Crick palindromes [?] and other palindromes with involution, and may be interesting for RNA studies, along with affix trees [14] and affix arrays [?].

¹ This structure can be found, with reference to the first author, in a few IT blogs under the name “palindromic tree”. See, e.g., http://adilet.org/blog/25-09-14/.
In Sect. 2 we first recall the problem of counting distinct palindromic substrings in an online fashion. This was the motivating example for inventing the eertree. This data structure contains the digraph of all palindromic factors of an input string S and supports the operation add(c), which appends a new symbol to the end of S. The number of nodes in the digraph equals the number of distinct palindromes inside S. Maintaining an eertree for a length n string with σ distinct symbols requires O(n log σ) time and O(n) space (for a random string, the expected space is O(√(nσ))). After introducing the eertree we discuss some of its properties and simple applications.

In Sect. 3 we study advanced questions related to eertrees. We consider the joint eertree of several strings and name a few problems solved with its use. Then we design two “smooth” variations of the algorithm which builds an eertree. These variations require at most logarithmic time for each call of add(c) and thus allow one to support an eertree for a string with two operations: appending and deleting the last symbol. Using one of these variations, we design a fast backtracking algorithm enumerating all rich strings over a fixed alphabet up to a given length. (A string is rich if it contains the maximum possible number of distinct palindromes.) Finally, we show that an eertree can be efficiently turned into a persistent data structure.

The use of eertrees for factorization problems is described in Sect. 4. Namely, new fast algorithms are given for the k-factorization of a string and for computing its palindromic length. We also conjecture that the palindromic length can be found in linear time and provide some arguments supporting this conjecture.

Definitions and Notation. We study finite strings, viewing them as arrays of symbols: w = w[1..n]. The notation σ stands for the number of distinct symbols of the processed string. We write ε for the empty string, |w| for the length of w, w[i] for the ith letter of w, and w[i..j] for w[i]w[i+1]⋯w[j], where w[i..i−1] = ε for any i. A string u is a substring of w if u = w[i..j] for some i and j. A substring w[1..j] (resp., w[i..n]) is a prefix (resp., suffix) of w. If a substring (prefix, suffix) of w is a palindrome, it is called a subpalindrome (resp., prefix-palindrome, suffix-palindrome). A subpalindrome w[l..r] has center (l+r)/2 and radius ⌈(r−l+1)/2⌉. Throughout the paper, we do not count ε as a palindrome.

A trie is a rooted tree with some nodes marked as terminal and all edges labeled by symbols such that no node has two outgoing edges with the same label. Each trie represents a finite set of strings, which label the paths from the root to the terminal nodes.

2 Building An Eertree 

2.1 Motive problem: distinct subpalindromes online 

The well-known online linear-time Manacher's algorithm [?] outputs the maximal radii of subpalindromes in a string for all possible centers, thus encoding all subpalindromes of a string. Another interesting problem is to find and count all distinct subpalindromes. Groult et al. [5] solved this problem offline in linear time and asked for an online solution. Such a solution in O(n log σ) time and O(n) space was given in [11], based on Manacher's algorithm and Ukkonen's suffix tree algorithm [?]. As was proved in the same paper, this solution is asymptotically optimal in the comparison-based model. But in spite of its good asymptotics, this algorithm is based on two rather “heavy” data structures. It is natural to try to find a lightweight structure for solving the analyzed problem with the same asymptotics. Such a data structure, the eertree, is described below. Its further analysis revealed that it is suitable for coping with many algorithmic problems involving palindromes.

2.2 Eertree: structure, interface, construction 

The basic version of the eertree supports a single operation add(c), which appends the symbol c to the processed string (from the right), updates the data structure accordingly, and returns the number of new palindromes that appeared in the string. According to the next lemma, add(c) returns 0 or 1.

Lemma 1 ([?]). Let S be a string and c be a symbol. The string Sc contains at most one subpalindrome which is not a substring of S. This new palindrome is the longest suffix-palindrome of Sc.
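Lemma 1 can be checked mechanically on small inputs. The Python sketch below is our own brute-force illustration, not code from the paper, and the helper names are hypothetical. It computes the palindromes of Sc that do not occur in S and confirms, for all binary strings S of length less than 10, that there is at most one such palindrome and that it is the longest suffix-palindrome of Sc.

```python
from itertools import product


def palindromes(s):
    """All distinct nonempty palindromic substrings of s (brute force)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)
            if s[i:j] == s[i:j][::-1]}


def longest_suffix_palindrome(s):
    """Longest suffix of the nonempty string s that is a palindrome."""
    return next(s[i:] for i in range(len(s)) if s[i:] == s[i:][::-1])


def check_lemma1(s, c):
    """Return the palindromes of s + c that do not occur in s, asserting
    that there is at most one and that it is the longest suffix-palindrome."""
    new = palindromes(s + c) - palindromes(s)
    assert len(new) <= 1
    if new:
        assert new == {longest_suffix_palindrome(s + c)}
    return new


# Exhaustive check over all binary strings s with |s| < 10 and both choices of c.
for n in range(10):
    for letters in product("ab", repeat=n):
        for c in "ab":
            check_lemma1("".join(letters), c)
```

For example, appending a to aababba creates no new palindrome, because the longest suffix-palindrome of aababbaa is aa, which already occurs in aababba.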

From inside, an eertree is a directed graph with some extra information. Its nodes, numbered with positive integers starting with 1, are in one-to-one correspondence with the subpalindromes of the processed string. Below we denote a node and the corresponding palindrome by the same letter. We write eertree(S) for the state of the eertree after processing the string S letter by letter, left to right.

Remark 1. To report the number of distinct subpalindromes of S, just return the maximum number of a node in eertree(S).

Each node v stores the length len[v] of its palindrome. For initialization purposes, two special nodes are added: the node with number 0 and length 0 for the empty string, and the node with number −1 and length −1 for the “imaginary string”.

The edges of the graph are defined as follows. If c is a symbol and v and cvc are two nodes, then an edge labeled by c goes from v to cvc. The edge labeled by c goes from the node 0 (resp., −1) to the node labeled by cc (resp., by c) if it exists. This explains why we need two initial nodes. The outgoing edges of a node v are stored in a dictionary which, given a symbol c, returns the edge to[v][c] labeled by it. Such a dictionary is implemented as a balanced binary search tree.

An unlabeled suffix link link[v] goes from v to u if u is the longest proper suffix-palindrome of v. By definition, link[c] = 0 and link[0] = link[−1] = −1. The resulting graph, consisting of nodes, edges, and suffix links, is the eertree; see Fig. 1 for an example.

Lemma 2. A node of positive length in an eertree has exactly one incoming 
edge. 
Fig. 1. The eertree of the string eertree. Edges are black, suffix links are blue.


Proof. An edge leading to a node u is labeled by c = u[1]. Then its origin must be the node v such that u = cvc, or the node −1 if u = c. □

Proposition 1. The eertree of a string S of length n is of size O(n).

Proof. The eertree of S has at most n+2 nodes, including the special ones (by Lemma 1), at most n edges (by Lemma 2), and at most n+2 suffix links (one per node). □

Proposition 2. For a string S of length n, eertree(S) can be built online in O(n log σ) time.

Proof. We start by defining eertree(ε) as the graph with two nodes (0 and −1) and two suffix links. Then we make the calls add(S[1]), …, add(S[n]) in this order. By Lemma 1 and the definition of add, after each call we know the longest suffix-palindrome maxSuf(T) of the string T processed so far. We maintain the following invariant: after a call to add, all edges and suffix links between the existing nodes are defined. In this case, when adding a new node u one must build exactly one edge (by Lemma 2) and one suffix link: any suffix-palindrome of u is its prefix as well, and hence the destination node of the suffix link from u already exists.

Consider the situation after i calls. We have to perform the next call, say add(a), to T = S[1..i]. We need to find the longest suffix-palindrome P of Ta. Clearly, P = a or P = aQa, where Q is a suffix-palindrome of T. Thus, to determine P we should find the longest suffix-palindrome of T preceded by a. To do this, we traverse the suffix-palindromes of T in order of decreasing length, starting with maxSuf(T) and following suffix links. For each palindrome we read its length k and compare T[i−k] against a until we get an equality or arrive at the node −1. In the former case, the current palindrome is Q; we check whether it has an outgoing edge labeled by a. If yes, the edge leads to aQa = P, and P is not new; if no, we create the node P of length |Q| + 2 and the edge (Q, P). In the latter case, P = a; as above, we check the existence of P in the graph from the current node (which is now −1) and create the node if necessary, together with the edge (−1, P) and the suffix link (P, 0).

It remains to create the suffix link from P if |P| > 1. It leads to the second longest suffix-palindrome of Ta. This palindrome can be found similarly to P: just continue traversing the suffix-palindromes of T starting with the suffix link of Q.

Now we estimate the time complexity. During a call to add(a), one checks the existence of the edge from Q with the label a in the dictionary, spending O(log σ) time. The path from the old to the new value of maxSuf requires one transition by an edge (from Q to P) and k ≥ 0 transitions by suffix links, and is accompanied by k + 1 comparisons of symbols. In order to estimate k, follow the position of the first symbol of maxSuf: a transition by a suffix link moves it to the right, and a transition by an edge moves it one symbol to the left. During the whole construction of eertree(S) there are n edge transitions, so this symbol moves to the left by n positions in total, while its final position is at most n positions to the right of its initial one. Hence, the total number of transitions by suffix links is at most 2n. The same argument works for the second longest suffix-palindrome, which is used to create suffix links. Thus, the total number of graph transitions and symbol comparisons is O(n), and the time complexity is dominated by checking the existence of edges, which takes O(n log σ) time in total. □
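The construction in the proof of Proposition 2 is short enough to write out in full. The Python sketch below is our own illustration, not the authors' code. It keeps the paper's node numbering (−1 for the imaginary string, 0 for the empty string, palindromes from 1). As a simplifying assumption, each dictionary to[v] is a Python dict (a hash table) instead of a balanced binary search tree, so an edge lookup takes expected O(1) time rather than O(log σ). The helper name _preceded is hypothetical.

```python
class Eertree:
    """Basic eertree built online, following the proof of Proposition 2.

    Nodes use the paper's numbering: -1 is the imaginary string (length -1),
    0 is the empty string, and palindromes get 1, 2, ... in creation order.
    """

    def __init__(self):
        self.s = []                    # processed string S
        self.len = {-1: -1, 0: 0}      # len[v]
        self.link = {-1: -1, 0: -1}    # suffix links; link[0] = link[-1] = -1
        self.to = {-1: {}, 0: {}}      # edges: to[v][c] is the node cvc
        self.max_suf = 0               # node of the longest suffix-palindrome
        self.size = 0                  # maximum node number (Remark 1)

    def _preceded(self, v, i):
        """Walk suffix links from v to the longest suffix-palindrome of S[:i]
        preceded by S[i] (0-based); node -1 always qualifies."""
        while not (i - self.len[v] - 1 >= 0
                   and self.s[i - self.len[v] - 1] == self.s[i]):
            v = self.link[v]
        return v

    def add(self, c):
        """Append c to S and return the number (0 or 1) of new palindromes."""
        i = len(self.s)
        self.s.append(c)
        q = self._preceded(self.max_suf, i)       # P = cQc, or P = c if q == -1
        if c in self.to[q]:                        # P already exists
            self.max_suf = self.to[q][c]
            return 0
        self.size += 1
        p = self.size
        self.len[p] = self.len[q] + 2
        self.to[p] = {}
        if self.len[p] == 1:
            self.link[p] = 0                       # link[c] = 0
        else:                                      # second longest suffix-palindrome
            self.link[p] = self.to[self._preceded(self.link[q], i)][c]
        self.to[q][c] = p
        self.max_suf = p
        return 1
```

For example, feeding the letters of eertree to add returns 1 on every call and leaves size = 7, which matches the seven palindromes e, ee, r, t, rtr, ertre, eertree of Fig. 1. The edge is stored only after the suffix link is found. This ordering is a defensive choice: the search for link[p] starts below Q, so it never needs the new edge.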


2.3 Some properties of eertrees 

We call a node odd (resp., even) if it corresponds to an odd-length (resp., even-length) palindrome. By a suffix path we mean a path consisting of suffix links.

Lemma 3. 1) Nodes and edges of an eertree form two weakly connected components: the tree of odd (resp., even) nodes rooted at −1 (resp., at 0).

2) The tree of even (resp., odd) nodes is precisely the trie of right halves of even-length palindromes (resp., the trie of right halves, including the central symbol, of odd-length palindromes).

3) Nodes and inverted suffix links of an eertree form a tree with a loop at its root −1.

Proof. 1) If an edge (u, v) exists, then |v| = |u| + 2. Hence, the edges of an eertree constitute no cycles, and odd nodes are unreachable from even ones and vice versa. Further, Lemma 2 implies that each even (resp., odd) node can be reached by a unique path from 0 (resp., −1). So we have the two required trees.

2) This is immediate from the definitions of a trie and an edge.

3) A suffix link decreases the length of a node, except for the node −1. So the only cycle of suffix links is the loop at −1. Each node has a unique suffix link and is connected by a suffix path to the node −1. So the considered graph is a tree (with a loop at the root) by definition. □

Remark 2. Tries are convenient data structures, but a trie built from the set of all suffixes (or all factors) of a length n string is usually of size Θ(n²). For a linear-space implementation, such a trie should be compressed into a more complicated and less handy structure: a suffix tree or a suffix automaton (DAWG). On the other hand, eertrees are linear-size tries and do not need any compression. Moreover, the size of an eertree is usually much smaller than n, because the expected number of distinct palindromes in a length n string is O(√(nσ)) [?]. This fact explains the high efficiency of eertrees in solving different problems.

Remark 3. A θ-palindrome is a string $S = a_1 \cdots a_n$ equal to $\theta(a_n \cdots a_1)$, where θ is a symbol-to-symbol function (extended to strings letterwise) and θ² is the identity (see, e.g., [?]). Clearly, an eertree containing all θ-palindromes of a string can be built in the way described in Proposition 2 (the comparisons of symbols should take θ into account).
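As a concrete illustration of Remark 3, the Python sketch below adapts the basic construction to θ-palindromes. It is our own adaptation, not code from the paper. We assume that θ is given as a dict, and we use the Watson-Crick complement A↔T, C↔G as the example. Two changes are needed. First, the symbol preceding a suffix-palindrome is compared against θ(c) rather than against c. Second, the node −1 matches only when c = θ(c). If even −1 fails, the new string has no nonempty suffix θ-palindrome, and this sketch then lets maxSuf fall back to the empty node 0. The class name ThetaEertree is hypothetical.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}   # assumed theta: Watson-Crick complement


class ThetaEertree:
    """Eertree of theta-palindromes (Remark 3): strings equal to theta applied
    letterwise to their reversal. Node numbering as in the paper."""

    def __init__(self, theta):
        self.theta = theta
        self.s = []
        self.len = {-1: -1, 0: 0}
        self.link = {-1: -1, 0: -1}
        self.to = {-1: {}, 0: {}}
        self.max_suf = 0          # 0 also means "no nonempty suffix theta-palindrome"
        self.size = 0

    def _preceded(self, v, i):
        """Walk suffix links from v to the longest suffix theta-palindrome of
        S[:i] preceded by theta(S[i]); None if even the node -1 fails."""
        want = self.theta[self.s[i]]
        while True:
            k = self.len[v]
            if i - k - 1 >= 0 and self.s[i - k - 1] == want:
                return v
            if v == -1:           # -1 matches only if S[i] == theta(S[i])
                return None
            v = self.link[v]

    def add(self, c):
        """Append c; return the number (0 or 1) of new theta-palindromes."""
        i = len(self.s)
        self.s.append(c)
        q = self._preceded(self.max_suf, i)
        if q is None:             # Sc has no nonempty suffix theta-palindrome
            self.max_suf = 0
            return 0
        if c in self.to[q]:
            self.max_suf = self.to[q][c]
            return 0
        self.size += 1
        p = self.size
        self.len[p] = self.len[q] + 2
        self.to[p] = {}
        r = self._preceded(self.link[q], i) if q != -1 else None
        self.link[p] = self.to[r][c] if r is not None else 0
        self.to[q][c] = p
        self.max_suf = p
        return 1
```

With θ equal to the identity map, this sketch behaves like the basic eertree. With the Watson-Crick complement it finds only even-length θ-palindromes, because no base is its own complement. For instance, the string GAATTCAT contains exactly three distinct θ-palindromes: AT, AATT, and GAATTC.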


2.4 First applications 

We demonstrate the performance of eertrees on two test problems taken from student programming contests. The first problem is Palindromic Refrain [?, Problem A], stated as follows: for a given string S, find a subpalindrome P maximizing the value |P| · occ(S, P), where occ(S, P) is the number of occurrences of P in S. The solution to this problem suggested by the contest jury used a suffix data structure and Manacher's algorithm.

Proposition 3. Palindromic Refrain can be solved by an eertree with the use of O(n) additional time and space.

Proof. In order to find occ[v] for each node of eertree(S), we store an auxiliary parameter occAsMax[v], which is the number of i's such that maxSuf(S[1..i]) = v. This parameter is easy to compute online: after a call to add, we increment occAsMax for the current maxSuf. After building eertree(S), we compute the values of occ as follows:

$$\mathrm{occ}[v] = \mathrm{occAsMax}[v] + \sum_{u\,:\,\mathrm{link}[u] = v} \mathrm{occ}[u] \qquad (1)$$

Indeed, if v is a suffix of S[1..i] for some i, then either v = maxSuf(S[1..i]) and this occurrence is counted in occAsMax[v], or v is the longest proper suffix-palindrome of some suffix-palindrome u of S[1..i] (namely, the shortest suffix-palindrome of S[1..i] longer than v). In the latter case, link[u] = v, and this occurrence of v is counted in occ[u]. To compute the values of occ in the order prescribed by (1), one can traverse the tree of suffix links bottom-up:

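    for (v = size; v >= 1; v--)        // start from occAsMax
        occ[v] = occAsMax[v]
    for (v = size; v >= 1; v--)        // every u with link[u] = v has u > v,
        occ[link[v]] += occ[v]         // so occ[v] is final when v is reached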


Here size is the maximum number of a node in eertree(S). Note that the node link[v] always has a smaller number than v, because link[v] exists at the moment of creation of v. After computing occ for all nodes, P = argmax_v(occ[v] · len[v]). □
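For Palindromic Refrain, the whole solution fits into one function. The Python sketch below is our own illustration of the proof of Proposition 3, not the contest's reference solution. It builds the eertree as in Proposition 2, with dicts standing in for balanced search trees. It then counts occAsMax online and applies formula (1) by scanning node numbers downwards. The extra map end, which remembers where each palindrome was first seen so that P can be returned as a string, is our own addition, and the function name is hypothetical.

```python
def palindromic_refrain(s):
    """Return (|P| * occ(s, P), P) for a subpalindrome P of the nonempty
    string s maximizing |P| * occ(s, P), as in the proof of Proposition 3."""
    length = {-1: -1, 0: 0}
    link = {-1: -1, 0: -1}
    to = {-1: {}, 0: {}}
    occ_as_max = {0: 0}         # occAsMax[v]
    end = {}                    # an end position of each palindrome, to print it
    max_suf = size = 0

    def preceded(v, i):
        # follow suffix links until the suffix-palindrome v of s[:i] is preceded by s[i]
        while not (i - length[v] - 1 >= 0 and s[i - length[v] - 1] == s[i]):
            v = link[v]
        return v

    for i, c in enumerate(s):
        q = preceded(max_suf, i)
        if c not in to[q]:                      # new node P = cQc (or P = c)
            size += 1
            length[size], to[size], occ_as_max[size], end[size] = length[q] + 2, {}, 0, i
            link[size] = 0 if length[size] == 1 else to[preceded(link[q], i)][c]
            to[q][c] = size
        max_suf = to[q][c]
        occ_as_max[max_suf] += 1                # maxSuf(s[:i+1]) = max_suf

    occ = dict(occ_as_max)
    for v in range(size, 0, -1):                # formula (1): link[v] < v
        occ[link[v]] += occ[v]
    best = max(range(1, size + 1), key=lambda v: occ[v] * length[v])
    return occ[best] * length[best], s[end[best] - length[best] + 1:end[best] + 1]
```

For s = aaa, the candidates are a (3 occurrences), aa (2 occurrences), and aaa (1 occurrence). The products are 3, 4, and 3, so the sketch returns (4, "aa").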





The second problem is Palindromic Pairs [22, Problem B]: for a string S, find the number of triples i, j, k such that 1 ≤ i ≤ j < k ≤ |S| and the strings S[i..j], S[j+1..k] are palindromes.

Proposition 4. Palindromic Pairs can be solved by an eertree with the use of O(n log σ) additional time and O(n) space.

Proof. Let maxSuf[j] = maxSuf(S[1..j]) and let sufCount[v] be the number of suffix-palindromes of the subpalindrome v of S, including v itself. Note that sufCount[v] = 1 + sufCount[link[v]] (with sufCount[0] = 0). Hence, sufCount[v] can be stored in the node v of eertree(S) and computed when this node is created. In addition, we store the values maxSuf[1], …, maxSuf[n] in a separate array. The number of palindromes ending at position j of S is the number of suffix-palindromes of S[1..j], or equivalently of maxSuf(S[1..j]). So this number equals sufCount[maxSuf[j]].

Further, let prefCount[v] be the number of prefix-palindromes of v and maxPref[j] be the longest prefix-palindrome of S[j..n]. The values of prefCount and maxPref can be found when building eertree($\overline{S}$)². Similarly to the above, the number of palindromes beginning at position j of S is prefCount[maxPref[j]]. Note that all additional computations take O(1) time for each call of add, except for the second eertree, which requires O(n log σ) time.

For a fixed j, the number of triples (i, j, k) defining a palindromic pair is the number of palindromes ending at position j times the number of palindromes beginning at position j+1. Hence, the answer to the problem is

$$\sum_{j=1}^{n-1} \mathrm{sufCount}[\mathrm{maxSuf}[j]] \cdot \mathrm{prefCount}[\mathrm{maxPref}[j+1]].$$

Since this is also a linear-time computation, we are done with the proof. □
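The proof of Proposition 4 can be sketched in Python as follows. This is our own illustration, not the authors' code, and the function names are hypothetical. Two liberties are taken. Dicts replace balanced search trees. Also, instead of reusing eertree(S), the sketch simply builds a second eertree for the reversed string. This works because the number of palindromes beginning at position j of S equals the number of palindromes ending at the mirrored position of the reversal.

```python
def palindrome_end_counts(s):
    """counts[j] = number of palindromes ending at position j of s (0-based),
    i.e. sufCount[maxSuf[j]], computed with an eertree built online."""
    length = {-1: -1, 0: 0}
    link = {-1: -1, 0: -1}
    to = {-1: {}, 0: {}}
    suf_count = {0: 0}                    # sufCount[v] = 1 + sufCount[link[v]]
    max_suf, size, counts = 0, 0, []

    def preceded(v, i):
        # follow suffix links until the suffix-palindrome v of s[:i] is preceded by s[i]
        while not (i - length[v] - 1 >= 0 and s[i - length[v] - 1] == s[i]):
            v = link[v]
        return v

    for i, c in enumerate(s):
        q = preceded(max_suf, i)
        if c not in to[q]:
            size += 1
            length[size], to[size] = length[q] + 2, {}
            link[size] = 0 if length[size] == 1 else to[preceded(link[q], i)][c]
            suf_count[size] = 1 + suf_count[link[size]]
            to[q][c] = size
        max_suf = to[q][c]
        counts.append(suf_count[max_suf])
    return counts


def palindromic_pairs(s):
    """Number of triples i <= j < k such that s[i..j] and s[j+1..k] are palindromes."""
    ends = palindrome_end_counts(s)
    # A palindrome starting at j in s is a palindrome ending at len(s)-1-j in reversed s.
    begins = palindrome_end_counts(s[::-1])[::-1]
    return sum(ends[j] * begins[j + 1] for j in range(len(s) - 1))
```

For s = aaa, the end counts are [1, 2, 3] and the begin counts are [3, 2, 1], so the answer is 1·2 + 2·1 = 4.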


3 Advanced Modifications of Eertrees 

3.1 Joint eertree for several strings 

When a problem involves the comparison of two or more strings, it may be useful to build a joint data structure. For example, a variety of problems can be solved by joint (“generalized”) suffix trees; see [7]. Here we introduce the joint eertree of a set of strings and name several problems it can solve.

A joint eertree eertree($S_1, \ldots, S_k$) is built as follows. We build eertree($S_1$) in the usual fashion; then we reset the value of maxSuf to 0 and proceed with the string $S_2$, addressing the add calls to the currently built graph; and so on, until all strings are processed. Each created node stores an additional k-element boolean array flag. After each call to add, we update flag for the current maxSuf node, setting its ith bit to 1, where $S_i$ is the string being processed. As a result, flag[v][i] equals 1 if and only if v is contained in $S_i$.

² The strings S and $\overline{S}$ have exactly the same subpalindromes, so there is no need to build the second eertree. We just perform calls to add on eertree(S) and fill prefCount and maxPref.

Some problems easily solved by a joint eertree are gathered below.

| Problem | Solution |
|---|---|
| Find the number of subpalindromes common to all k given strings. | Build eertree($S_1, \ldots, S_k$) and count the nodes having only 1's in the flag array. |
| Find the longest subpalindrome contained in all k given strings. | Build eertree($S_1, \ldots, S_k$). Among the nodes having only 1's in the flag array, find the node of the greatest length. |
| For strings S and T, find the number of palindromes P having more occurrences in S than in T. | Build eertree(S, T), computing $\mathrm{occ}_S$ and $\mathrm{occ}_T$ in its nodes (see Palindromic Refrain in Sect. 2.4). Return the number of nodes v such that $\mathrm{occ}_S[v] > \mathrm{occ}_T[v]$. |
| For strings S and T, find the number of equal palindromes, i.e., of triples (i, j, k) such that S[i..i+k] = T[j..j+k] is a palindrome. | Build eertree(S, T), computing the values $\mathrm{occ}_S$ and $\mathrm{occ}_T$ in its nodes. The answer is $\sum_v \mathrm{occ}_S[v] \cdot \mathrm{occ}_T[v]$. |
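The first two rows of the table can be sketched in Python as follows. This is our own illustration of the joint construction described above, not code from the paper. Dicts stand in for balanced search trees, the flag arrays are Python lists of booleans, and the function names are hypothetical.

```python
def joint_eertree(strings):
    """Joint eertree(S_1, ..., S_k) as in Sect. 3.1.

    Returns (length, flag): flag[v][i] is True iff the palindrome v occurs
    in strings[i]; palindromes are numbered 1, 2, ... as in a single eertree."""
    k = len(strings)
    length = {-1: -1, 0: 0}
    link = {-1: -1, 0: -1}
    to = {-1: {}, 0: {}}
    flag = {}

    def preceded(s, v, i):
        # follow suffix links until the suffix-palindrome v of s[:i] is preceded by s[i]
        while not (i - length[v] - 1 >= 0 and s[i - length[v] - 1] == s[i]):
            v = link[v]
        return v

    for idx, s in enumerate(strings):
        max_suf = 0                              # reset maxSuf for the next string
        for i, c in enumerate(s):
            q = preceded(s, max_suf, i)
            if c not in to[q]:
                p = len(flag) + 1                # next free node number
                length[p], to[p], flag[p] = length[q] + 2, {}, [False] * k
                link[p] = 0 if length[p] == 1 else to[preceded(s, link[q], i)][c]
                to[q][c] = p
            max_suf = to[q][c]
            flag[max_suf][idx] = True            # maxSuf occurs in strings[idx]
    return length, flag


def common_palindromes(strings):
    """(number of palindromes common to all strings, length of the longest one)."""
    length, flag = joint_eertree(strings)
    common = [v for v in flag if all(flag[v])]
    return len(common), max((length[v] for v in common), default=0)
```

For the strings abacaba and cabac, the common palindromes are a, b, c, and aba, so common_palindromes returns (4, 3). Flagging only the maxSuf nodes is enough: by Lemma 1 applied to each string separately, every palindrome of $S_i$ is maxSuf at the end of its first occurrence in $S_i$.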


3.2 Coping with deletions 

In the proof of Proposition]^ an 0(nlogcr) algorithm for building an eertree is 
given. Nevertheless, in some cases one call of add requires I2(n) time, and this 
kills some possible applications. For example, we may want to support an eertree 
for a string which can be changed in two ways: by appending a symbol on the 
right (add(c)) and by deleting the last symbol (pop()). Consider the following 
sequence of calls: 

add(a),..., add(a), add(6), pop(), add(6), pop(),..., add(&), pop() 

'-V-''-V-" 

n/3 times n/3 times 

Since each appending of b requires n/3 suffix link transitions, the algorithm 
from Proposition will process this sequence in time independent of the 

implementation of the operation pop(). 

Below we describe two algorithms which build eertrees in a way that provides an efficient solution to the problem with deletions.

Searching suffix-palindromes with quick links. Consider a pair of nodes v, link[v] in an eertree and the symbol b = v[|v| − |link[v]|] preceding the suffix link[v] in v. In addition to the suffix link, we define the quick link: let quickLink[v] be the longest suffix-palindrome of v preceded in v by a symbol different from b.

Lemma 4. When a node v is created, the link quickLink[v] can be computed in O(1) time.

Proof. The two longest proper suffix-palindromes of v are u = link[v] and u' = link[link[v]]. Assume that v has suffixes bu and cu'. If c ≠ b, then quickLink[v] = u' by definition. If c = b, then clearly quickLink[v] = quickLink[u]. Thus we need a constant number of operations. The code computing the quick link of v is given below. □


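    // v ends at position n of S, so link[v] and link[link[v]] are preceded
    // in v by S[n - len[link[v]]] and S[n - len[link[link[v]]]]
    if (S[n - len[link[v]]] == S[n - len[link[link[v]]]])
        quickLink[v] = quickLink[link[v]]      // c = b
    else
        quickLink[v] = link[link[v]]           // c != b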


Recall that when appending a letter c to the current string S, we scan the suffix-palindromes of S to find the longest suffix-palindrome Q preceded by c; then maxSuf(Sc) = cQc. (If cQc is a new palindrome, then this scan continues until link[cQc] is found.) The use of quick links reduces the number of scanned suffix-palindromes as follows. When the current palindrome is v, we check both v and link[v]. If neither of them is preceded by c, then no suffix-palindrome of v longer than quickLink[v] is preceded by c either: those shorter than link[v] are preceded by the same symbol as link[v]. So we skip them and check quickLink[v] next.

Example 1. Let us call add(b) on the eertree of the string S = aabaabaaba. The longest suffix-palindrome of S is the string v = abaabaaba. Since the symbols preceding v and link[v] = abaaba in S are distinct from b, we jump to quickLink[v] = a, skipping the suffix-palindrome aba, which is preceded by the same letter as link[v]. Now quickLink[v] is preceded by b, so we find maxSuf(Sb) = bab. Note that v “does not know” which symbol precedes its particular occurrence, and different occurrences can be preceded by different symbols. So there is no way to avoid checking the symbol preceding link[v].
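The scan with quick links can be sketched in Python as follows. This is our own illustration, with the same simplifications as before: dicts for the edge dictionaries and the paper's node numbering. Following Lemma 4, the quick link of a new node is computed from the two symbols preceding link[p] and link[link[p]]. We assume the conventions quickLink[c] = −1 for a one-letter node c, since nothing but the node −1 lies below it, and that the empty suffix (node 0) is preceded by the last letter of the string. The class name QuickLinkEertree and the trace list, which records the lengths of the palindromes examined by the last add, are hypothetical.

```python
class QuickLinkEertree:
    """Eertree whose suffix-palindrome scans use quick links (Sect. 3.2).

    Node numbering as in the paper; trace records the lengths of the
    suffix-palindromes examined during the last call to add."""

    def __init__(self):
        self.s = []
        self.len = {-1: -1, 0: 0}
        self.link = {-1: -1, 0: -1}
        self.quick = {-1: -1, 0: -1}
        self.to = {-1: {}, 0: {}}
        self.max_suf = 0
        self.size = 0
        self.trace = []

    def _ok(self, v, i):
        """Is the suffix-palindrome v of S[:i] preceded by S[i]? (-1 always is.)"""
        self.trace.append(self.len[v])
        j = i - self.len[v] - 1
        return j >= 0 and self.s[j] == self.s[i]

    def _find(self, v, i):
        """Longest suffix-palindrome of S[:i], not longer than v, preceded by S[i]."""
        while True:
            if self._ok(v, i):
                return v
            if self._ok(self.link[v], i):
                return self.link[v]
            v = self.quick[v]          # skip palindromes preceded like link[v]

    def add(self, c):
        """Append c and return the number (0 or 1) of new palindromes."""
        self.trace = []
        i = len(self.s)
        self.s.append(c)
        q = self._find(self.max_suf, i)
        if c in self.to[q]:
            self.max_suf = self.to[q][c]
            return 0
        self.size += 1
        p = self.size
        self.len[p] = self.len[q] + 2
        self.to[p] = {}
        self.link[p] = 0 if q == -1 else self.to[self._find(self.link[q], i)][c]
        u = self.link[p]
        if u == 0:                         # |p| = 1: only -1 lies below
            self.quick[p] = -1
        elif self.s[i - self.len[u]] == self.s[i - self.len[self.link[u]]]:
            self.quick[p] = self.quick[u]  # Lemma 4, case c = b
        else:
            self.quick[p] = self.link[u]   # Lemma 4, case c != b
        self.to[q][c] = p
        self.max_suf = p
        return 1
```

Replaying Example 1 with this sketch, that is, adding b to aabaabaaba, examines palindromes of lengths 9, 6, and 1 to find Q = a. It then examines lengths 0 and −1 while locating the suffix link of bab. The palindrome aba is never visited.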

When constructing an eertree with quick links, on each step we spend O(1) additional time and space for maintaining these links and possibly reduce the number of processed suffix-palindromes. So the overall time and space bounds from Proposition 2 remain in effect. Let us estimate the number of operations per step. Statements on “series” of palindromes, analogous to the next proposition, were proved in several papers (see, e.g., [?, Lemmas 5, 6] and [?, Lemma 5]).

Proposition 5. In an eertree, a path consisting of quick links has length O(log n).

Corollary 1. The algorithm constructing an eertree using quick links spends O(log n) time and O(1) space for any call to add.

Using direct links. Now we describe the fastest algorithm for constructing an eertree, which, however, uses more than O(1) space for creating a node. Still, the space requirements are quite modest, so the algorithm is highly competitive:

Proposition 6. There is an algorithm which constructs an eertree spending O(log σ) time and O(min(log σ, log log n)) space for any call to add.




Proof. For each node v we create direct links: directLink[v][c] is the longest suffix-palindrome of v preceded in v by c.

Let Q be the longest suffix-palindrome of a string S preceded by c in S. Then either Q = maxSuf(S) or Q = directLink[maxSuf(S)][c], and the longest suffix-palindrome of Q preceded by c is directLink[Q][c]. Thus, we scan suffixes in constant time, and the time per step is now dominated by the O(log σ) time for searching an edge in the dictionary plus the time for creating direct links for a new node.

Note that the arrays directLink[v] and directLink[link[v]] coincide for all symbols except for the symbol c preceding link[v] in v. Hence, when creating a node v, we first find link[v], then copy directLink[link[v]] to directLink[v] and assign directLink[v][c] = link[v]. However, storing or copying direct links explicitly would cost a lot of space and time. So we do this implicitly, using a fully persistent balanced binary search tree (persistent tree for short; see [?]). We will not go into the details of the internals of the persistent tree, treating it as a black box. The persistent tree provides full access to any of its m versions, which are balanced binary search trees. The versions are ordered by the time of their creation. An update of any version results in creating a new (m+1)th version, which is also fully accessible; the updated version remains unchanged. Such an update as adding a node or changing the information in a node takes O(log k) time and space, where k is the size of the updated version.

We store direct links from all nodes of the eertree in a single persistent tree. Each version corresponds to a node. Direct links directLink[v][c] in a version v are stored as a search tree, with the letter c serving as the key for sorting (we assume an ordered alphabet). Creation of a node v requires an update of the version corresponding to the node link[v]. It remains to estimate the size of a single search tree. It is at most σ by definition, and it is O(log n) by Proposition 5. Thus, the update time and space are O(min(log σ, log log n)), as required. □
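The scan in the proof of Proposition 6 can be sketched in Python as follows. This is our own simplified illustration, not the authors' code. Each directLink[v] is an ordinary dict copied explicitly from directLink[link[v]]. As a result, creating a node costs O(min(σ, n)) time and space, rather than the O(min(log σ, log log n)) obtained with a persistent tree. We also assume two conventions: the empty suffix (node 0) counts as a suffix-palindrome of v preceded by the last letter of v, and a missing entry means the scan falls through to the node −1. The class name DirectLinkEertree is hypothetical.

```python
class DirectLinkEertree:
    """Eertree with direct links (proof of Proposition 6), paper's numbering.

    direct[v][d] is the longest proper suffix-palindrome of v preceded in v by d;
    the empty suffix (node 0) counts and is preceded by the last letter of v.
    Each direct[v] is an explicit dict copy, not a persistent search tree."""

    def __init__(self):
        self.s = []
        self.len = {-1: -1, 0: 0}
        self.link = {-1: -1, 0: -1}
        self.to = {-1: {}, 0: {}}
        self.direct = {0: {}}
        self.max_suf = 0
        self.size = 0

    def add(self, c):
        """Append c and return the number (0 or 1) of new palindromes."""
        i = len(self.s)
        self.s.append(c)
        v = self.max_suf
        j = i - self.len[v] - 1
        # Q = maxSuf(S) if it is preceded by c, else directLink[maxSuf(S)][c] (or -1)
        q = v if v > 0 and j >= 0 and self.s[j] == c else self.direct[v].get(c, -1)
        if c in self.to[q]:
            self.max_suf = self.to[q][c]
            return 0
        self.size += 1
        p = self.size
        self.len[p] = self.len[q] + 2
        self.to[p] = {}
        # link[p] = cRc, where R = directLink[Q][c] (or -1 if there is no entry)
        self.link[p] = 0 if q == -1 else self.to[self.direct[q].get(c, -1)][c]
        u = self.link[p]
        self.direct[p] = dict(self.direct[u])         # copy directLink[link[p]] ...
        self.direct[p][self.s[i - self.len[u]]] = u   # ... and redirect one symbol
        self.to[q][c] = p
        self.max_suf = p
        return 1
```

For the string eertree, the last node created is eertree itself (node 7). Its direct links map e to node 1 (the palindrome e) and r to node 2 (the palindrome ee). Both scans in add are constant-time dictionary lookups, as in the proof.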


Comparing different implementations. The three methods of building an eertree are gathered in the following table.

| Method | Time for n calls | Time for one call | Space for one node |
|---|---|---|---|
| basic | O(n log σ) | Ω(log σ) but O(n) | O(1) |
| quickLink | O(n log σ) | Ω(log σ) but O(log n) | O(1) |
| directLink | O(n log σ) | O(log σ) | O(min(log σ, log log n)) |


The basic version is the simplest one and uses the smallest amount of memory. Quick and direct links work somewhat faster, but their main advantage is that any single call is cheap and thus can be reversed without much pain. Hence, one can easily maintain an eertree for a string with both operations add(c) and pop(). Indeed, let add(c) push onto a stack the node containing P = maxSuf(Sc) and, if P is a new palindrome, the node containing Q such that P = cQc. This takes O(1) additional time and space. Then pop() reads this information from the stack and restores the previous state of the eertree in constant time.
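The stack discipline described above can be sketched in Python as follows. This is our own illustration, not the authors' code. The stack entries are the pairs (P, Q) named in the text, with Q = None when P is not new, and maxSuf is read off the top of the stack. To keep the sketch short we reuse the basic scan of Proposition 2. An adversarial sequence like the one at the start of this subsection therefore still costs Θ(n) per add in this sketch. The constant-time pop is unaffected, because the node removed by pop is always the newest one, so it carries the largest number and has no outgoing edges left. The class name StackEertree is hypothetical.

```python
class StackEertree:
    """Eertree of a string supporting add(c) and pop(), as described in Sect. 3.2.

    add pushes (P, Q) onto a stack, where P = maxSuf(Sc) and Q is the node with
    P = cQc if P is new (None otherwise); pop undoes the last add. For brevity
    the scan is the basic one of Proposition 2, so add is not worst-case fast."""

    def __init__(self):
        self.s = []
        self.len = {-1: -1, 0: 0}
        self.link = {-1: -1, 0: -1}
        self.to = {-1: {}, 0: {}}
        self.size = 0
        self.stack = []

    def max_suf(self):
        return self.stack[-1][0] if self.stack else 0

    def _preceded(self, v, i):
        while not (i - self.len[v] - 1 >= 0
                   and self.s[i - self.len[v] - 1] == self.s[i]):
            v = self.link[v]
        return v

    def add(self, c):
        """Append c and return the number (0 or 1) of new palindromes."""
        i = len(self.s)
        self.s.append(c)
        q = self._preceded(self.max_suf(), i)
        if c in self.to[q]:
            self.stack.append((self.to[q][c], None))
            return 0
        self.size += 1
        p = self.size
        self.len[p] = self.len[q] + 2
        self.to[p] = {}
        self.link[p] = 0 if q == -1 else self.to[self._preceded(self.link[q], i)][c]
        self.to[q][c] = p
        self.stack.append((p, q))
        return 1

    def pop(self):
        """Delete the last symbol, restoring the eertree to its previous state."""
        p, q = self.stack.pop()
        c = self.s.pop()
        if q is not None:                 # p was created by the add being undone
            del self.to[q][c]             # p is the newest node, numbered size
            del self.len[p], self.link[p], self.to[p]
            self.size -= 1
```

For example, after add(a), add(a), add(b), pop(), the structure holds the eertree of aa with two nodes. A following add(a) then creates the node aaa.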

The table above also raises the question of whether further optimization of the obtained algorithms is possible.

Question 1. Is there an online algorithm which builds an eertree spending O(log σ) time and O(1) space for any call to add?

3.3 Enumerating rich strings 

By Lemma 1, the number of distinct subpalindromes in a length n string is at most n. Strings with exactly n distinct subpalindromes are called rich. Rich strings possess a number of interesting properties; see, e.g., [?]. The sequence A216264 in the Online Encyclopedia of Integer Sequences [18] is the growth function of the language of binary rich strings, i.e., the nth term of this sequence is the number of binary rich strings of length n. J. Shallit computed this function up to n = 25, thus enumerating several million rich strings. Using the results of Sect. 3.2, we were able to raise the upper bound to n = 60, enumerating several trillion rich strings in 10 hours on an average laptop. The new numerical data show that this sequence grows much more slowly than previously expected.

Proposition 7 below serves as the theoretical basis for such a breakthrough in computation. It is based on the following obvious corollary of Lemma 1.

Lemma 5. Any prefix of a rich string is rich.

Proposition 7. Suppose that R is the number of k-ary rich strings of length ≤ n, for some fixed k and n. Then the trie built from all these strings can be traversed in O(R) time.

Proof. For simplicity, we give the proof for the binary alphabet. The extension 
to an arbitrary fixed alphabet is straightforward. Consider the following code, 
using an eertree on a string with deletions. 


    void calcRichString(i)
        ans[i]++                        // the current string of length i is rich
        if (i < n)
            if (add('0'))               // one new palindrome: the extension is rich
                calcRichString(i + 1)
            pop()
            if (add('1'))
                calcRichString(i + 1)
            pop()


Here i is the length of the currently processed rich string. Recall that add(c) appends c to the current eertree and returns the number of new palindromes, which is 0 or 1. Since the current string is rich, the modified string is rich if and only if add returns 1. Note that every added symbol is deleted afterwards by pop(). So we exit every particular call to calcRichString with the same string as the one with which we entered this call. As a result, the call calcRichString(0) traverses depth-first the trie of all binary rich strings of length ≤ n.

As was mentioned in Sect. 3.2, the pop operation works in constant time. For add we use the method with direct links. Since the alphabet is of constant size, the array directLink[v] can be copied in O(1) time. Hence, add also works in O(1) time. The number of pops equals the number of adds, and the latter is twice the number of rich strings of length < n. The number of other operations is constant per call of calcRichString, so we have the total O(R) time bound. □
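A runnable counterpart of calcRichString can be sketched in Python as follows. This is our own sketch under stated assumptions, not the authors' implementation. The alphabet is a parameter, and add and pop use the stack of (P, Q) pairs described in Sect. 3.2 on top of the basic scan of Proposition 2, rather than direct links. The sketch therefore illustrates the traversal but does not achieve the O(R) bound. The function name count_rich is hypothetical.

```python
def count_rich(n, alphabet="01"):
    """ans[i] = number of rich strings of length i over alphabet, for i <= n,
    found by the depth-first traversal calcRichString (proof of Proposition 7)."""
    s = []                                  # current string
    length = {-1: -1, 0: 0}
    link = {-1: -1, 0: -1}
    to = {-1: {}, 0: {}}
    stack = []                              # (P, Q) pairs pushed by add

    def preceded(v, i):
        while not (i - length[v] - 1 >= 0 and s[i - length[v] - 1] == s[i]):
            v = link[v]
        return v

    def add(c):                             # returns the number (0 or 1) of new palindromes
        i = len(s)
        s.append(c)
        q = preceded(stack[-1][0] if stack else 0, i)
        if c in to[q]:
            stack.append((to[q][c], None))
            return 0
        p = len(length) - 1                 # nodes -1 and 0 are always present
        length[p], to[p] = length[q] + 2, {}
        link[p] = 0 if q == -1 else to[preceded(link[q], i)][c]
        to[q][c] = p
        stack.append((p, q))
        return 1

    def pop():                              # undo the last add
        p, q = stack.pop()
        c = s.pop()
        if q is not None:
            del to[q][c], length[p], link[p], to[p]

    ans = [0] * (n + 1)

    def calc_rich_string(i):
        ans[i] += 1
        if i < n:
            for c in alphabet:
                if add(c):                  # the extended string is rich
                    calc_rich_string(i + 1)
                pop()

    calc_rich_string(0)
    return ans
```

Calling count_rich(n) returns the counts for lengths 0 through n. For small n, these can be cross-checked against a brute-force count over all strings.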

Remark 4. Visit http://pastebin.com/4YJxVzep for an implementation of the above algorithm. In 10 hours, it computed the first 58 terms of the sequence A216264. To increase the number of terms to 60, we used a few optimization tricks which reduce the constant in the O-term. We do not discuss these tricks here, because they make the code less readable.


3.4 Persistent eertrees 

In Sect. 3.2 we built an eertree supporting deletions from a string. A natural generalization of this approach leads to persistent eertrees. Recall that a persistent data structure is a set of “usual” data structures of the same type, called versions and ordered by the time of their creation. A call to a persistent structure asks for the access or update of any specific version. Existing versions are neither modified nor deleted; any update creates a new (latest) version.

Consider a tree of versions T whose nodes, apart from the root, are labeled by symbols. The tree represents the set of versions of some string S: each node v represents the string read from the root to v. Recall that we denote a node of a data structure by the same letter as the string related to it. Note that some versions can be identical except for the time of their creation (i.e., the number of the node). The problem we study is maintaining an eertree for each version of S. More precisely, the function addVersion(v, c) to be implemented adds a new child u labeled by c to the node v of T and computes eertree(u). The data structure which performs the calls to addVersion, supporting the eertrees for all nodes of T, will be called a persistent eertree. Surprisingly enough, this complicated structure can be implemented efficiently in spite of the fact that the current string cannot be addressed directly for symbol comparisons.


Proposition 8. The persistent eertree can be implemented to perform each call to addVersion(v, c) in O(log |v|) time and space.


Proof. We use the method with direct links and build, as in Sect. 3.1, a joint eertree for all versions. Each node of the tree T stores links to the palindromes of the corresponding version of S. Overall, a node v of T contains the following information: a binary search tree searchTree[v], containing links to all subpalindromes of v; the link maxSuf[v] to the maximal suffix-palindrome of v; the array pred[v], whose ith element is the link to the predecessor z of v such that the distance between z and v in T is $2^i$ (i ≥ 0); and the symbol symb[v] added to the parent of v to get v. All listed parameters except for searchTree[v] use O(log |v|) space. For search trees we use, as in Sect. 3.2, the persistent tree [?], reducing both time and space for copying the tree and inserting one element to O(log |v|). (Recall that another persistent tree is used inside the eertree for storing the direct links of all nodes.)



Now we implement addVersion(v, c) in time O(log |v|). Note that for any i the
symbol v[i] can be found in O(log |v|) time. Indeed, this symbol is symb[z], where
z is the predecessor of v such that the distance between z and v is h = |v| − i.
Using the binary representation of h, we can reach z from v in at most log |v|
steps following the appropriate pred links.

Let V be the current number of versions (at any time). Creating a new
version u with the parent v, we increment V by one and compute all parameters
for u. First we compute pred[u]. This can be done in O(log |u|) time because
pred[u][0] = v and pred[u][i] = pred[pred[u][i − 1]][i − 1] for i > 0.
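The recurrence for pred is ordinary binary lifting (jump pointers). The following Python sketch is a hypothetical illustration, not the paper's implementation: the class VersionTree and its method names are assumptions, version 0 stands for the root (the empty string), and jumps that would pass the root are simply not stored. It fills pred[u] by the recurrence above and reads the ith symbol of a version by following one pred link per set bit of h = |v| − i.

```python
class VersionTree:
    """Hypothetical tree of string versions; version 0 is the root (empty string).

    pred[u][i] is the ancestor of u at distance 2**i (stored only if it exists),
    symb[u] is the symbol added to the parent of u, depth[u] = |u|.
    """

    def __init__(self):
        self.pred = [[]]
        self.symb = [None]
        self.depth = [0]

    def add_version(self, v, c):
        """Create a child u of v labeled by c and return u."""
        u = len(self.depth)
        self.symb.append(c)
        self.depth.append(self.depth[v] + 1)
        pred_u = [v]  # pred[u][0] = v
        i = 0
        # pred[u][i + 1] = pred[pred[u][i]][i], as long as that ancestor exists
        while i < len(self.pred[pred_u[i]]):
            pred_u.append(self.pred[pred_u[i]][i])
            i += 1
        self.pred.append(pred_u)
        return u

    def ancestor(self, v, h):
        """Ancestor of v at distance h (0 <= h <= |v|): one jump per set bit of h."""
        i = 0
        while h:
            if h & 1:
                v = self.pred[v][i]
            h >>= 1
            i += 1
        return v

    def symbol(self, v, i):
        """The ith symbol (1-based) of version v, found in O(log |v|) jumps."""
        return self.symb[self.ancestor(v, self.depth[v] - i)]


T = VersionTree()
v = 0
for ch in "abcab":
    v = T.add_version(v, ch)                  # v spells "abcab"
w = T.add_version(T.ancestor(v, 2), "x")      # a second branch spelling "abcx"
print("".join(T.symbol(w, i) for i in range(1, 5)))  # abcx
```

Each call stores only O(log |v|) pred entries, which matches the space bound used in the proof.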

To compute the palindrome y = maxSuf[u], we call add(c) for the string
v. Let x be the parent of y in the eertree. Then x = maxSuf[v] if maxSuf[v] is
preceded by c in v, and x = directLink[maxSuf[v]][c] otherwise. Hence, to compute
y we access exactly one symbol of v. Further, if y is not in the eertree, a new
node of the eertree should be created for y. It is easy to see that link[y] =
to[directLink[x][c]][c]. Next, directLink[y] is copied from directLink[link[y]], with
one element replaced by link[y]. To find this element, we need to know the letter
of v preceding x. Therefore, to find maxSuf[u] and modify the eertree if necessary,
we need O(log |v|) time for accessing a constant number of symbols in v and
O(log σ) time for the rest of the computation in add(c). Finally, we create a version
of the search tree for u, updating the version for v with y (if y is in the search
tree for v, this tree is copied to the new version without changes). This operation
takes O(log |u|) time as well. The proposition is proved. The code for addVersion(v, c)
is given below. □


void getPred(v, par)
    pred[v][0] = par
    i = 0
    while (pred[v][i] > 0)           // stop at the root (version 0); pred entries are initialized by 0
        pred[v][i + 1] = pred[ pred[v][i] ][i]
        i++

int addVersion(v, c)
    t++                              // the number of versions, initialized by 0
    u = t
    symb[u] = c
    getPred(u, v)                    // fills pred[u]
    // v[i] is the ith symbol of version v, read via pred links in O(log |v|) time
    if (c == v[len[v] - len[maxSuf[v]]])
        x = maxSuf[v]
    else
        x = directLink[maxSuf[v]][c]
    maxSuf[u] = to[x][c]             // created if does not exist
    searchTree[u] = insert(searchTree[v], maxSuf[u])
    return u



4 Factorizations into Palindromes 

As was mentioned in the introduction, the k-factorization problem can be solved
online in O(kn) time for a length n string and any k [12]. In this section we
aim at solving this problem in time independent of k. This setting is motivated
by the fact that the expected palindromic length of a random string is Θ(n) [16],
and the O(kn) asymptotics is quite bad for such big values of k. On the positive
side, the palindromic length of a string S, which is the minimum k such that a
k-factorization of S exists, can be found in O(n log n) time [3].

4.1 Palindromic length vs k-factorization

Lemma 6. Given a k-factorization of a length n string S, it is possible, in
O(n) time, to factor S into k + 2t palindromes for any positive integer t such
that k + 2t ≤ n.

Proof. Let P_1, ..., P_k be palindromes, S = P_1 ⋯ P_k, k ≤ n − 2. It is sufficient
to show how to factor S into k + 2 palindromes. If |P_i| ≥ 3 for some i, then we
split P_i into three palindromes: the first letter, the last letter, and the remaining
part. Otherwise, all P_i have length 1 or 2, and since k ≤ n − 2, there are some
P_i, P_j of length 2, each of which can be split into two palindromes. □
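The splitting step in the proof can be written down directly. A minimal Python sketch, assuming palindromes are given as plain strings (the helper names are hypothetical):

```python
def is_palindrome(p):
    return p == p[::-1]


def add_two_palindromes(parts):
    """Turn a k-factorization (list of nonempty palindromes) into a (k+2)-factorization.

    Follows the proof of Lemma 6; requires k + 2 <= total length.
    """
    for idx, p in enumerate(parts):
        if len(p) >= 3:
            # first letter, remaining middle part (still a palindrome), last letter
            return parts[:idx] + [p[0], p[1:-1], p[-1]] + parts[idx + 1:]
    # Every part has length 1 or 2, so k + 2 <= n forces two parts of length 2.
    twos = [idx for idx, p in enumerate(parts) if len(p) == 2][:2]
    if len(twos) < 2:
        raise ValueError("k + 2 exceeds the length of the string")
    result = []
    for idx, p in enumerate(parts):
        result.extend([p[0], p[1]] if idx in twos else [p])
    return result
```

Calling it t times yields k + 2t palindromes. Each call here copies the list, so this loop costs O(tn); reaching the O(n) bound of the lemma would require performing all splits in a single pass.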

Thus, the k-factorization problem is reduced in linear time to two similar problems:
factor a string into the minimum possible odd (resp. even) number of
palindromes. We solve these two problems using an eertree. To do this, we first
describe an algorithm, based on an eertree, that finds the palindromic length in
O(n log n) time. While its asymptotics is the same as that of the algorithm of [3], its
constant under the O-term is much smaller (see Remark 6 below) and its code
is simpler and shorter.

Proposition 9. Using an eertree, the palindromic length of a length n string
can be found online in O(n log n) time.

Proof. For a length n string S we compute online the array ans such that ans[i]
is the palindromic length of S[1..i], with ans[0] = 0. Note that any k-factorization
of S[1..i] can be obtained by appending a suffix-palindrome S[j+1..i] of S[1..i] to a
(k−1)-factorization of S[1..j]. Thus, ans[i] = 1 + min{ans[j] | S[j+1..i] is a palindrome}.
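As a reference point, the recurrence can be evaluated directly. The sketch below (function name hypothetical) precomputes a quadratic table of palindromic substrings, so it uses O(n^2) time and space; it is only a correctness oracle for the recurrence, not the eertree-based method of Proposition 9. Python indices are 0-based, so s[j:i] plays the role of S[j+1..i].

```python
def palindromic_length(s):
    """Return ans with ans[i] = palindromic length of s[:i] (ans[0] = 0).

    Evaluates ans[i] = 1 + min{ans[j] : s[j:i] is a palindrome} directly,
    using an O(n^2) table of palindromic substrings.
    """
    n = len(s)
    # is_pal[j][i] tells whether s[j:i] is a palindrome (s[j:i] is S[j+1..i])
    is_pal = [[True] * (n + 1) for _ in range(n + 1)]
    for size in range(1, n + 1):
        for j in range(n - size + 1):
            i = j + size
            is_pal[j][i] = s[j] == s[i - 1] and (size <= 2 or is_pal[j + 1][i - 1])
    ans = [0] * (n + 1)
    for i in range(1, n + 1):
        # j = i - 1 (a single letter) always qualifies, so the minimum is defined
        ans[i] = 1 + min(ans[j] for j in range(i) if is_pal[j][i])
    return ans


print(palindromic_length("abaab"))  # [0, 1, 2, 1, 2, 2]
```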

To compute ans efficiently, we store two additional parameters in the nodes
of the eertree: the difference diff[v] = len[v] − len[link[v]] and the series link seriesLink[v],
which is the longest suffix-palindrome of v having the difference unequal to
diff[v]. Series links are similar to quick links, which are not suitable for the
problem studied. Clearly, the difference is computable in O(1) time and space
on the creation of a node; the following code shows that the same is true for the
series link.


if (diff[v] == diff[link[v]])
    seriesLink[v] = seriesLink[link[v]]
else
    seriesLink[v] = link[v]


The following “naive” implementation computes ans[n] in O(n) time:

ans[n] = ∞
for (v = maxSuf; len[v] > 0; v = link[v])
    ans[n] = min(ans[n], ans[n - len[v]] + 1)


With series links, the same idea can be rewritten as follows:

int getMin(u)
    res = ∞
    for (v = u; len[v] > len[seriesLink[u]]; v = link[v])
        res = min(res, ans[n - len[v]] + 1)
    return res

ans[n] = ∞
for (v = maxSuf; len[v] > 0; v = seriesLink[v])
    ans[n] = min(ans[n], getMin(v))
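A compact Python sketch of the procedure so far, under stated assumptions: node 0 is the root of length −1, node 1 is the empty palindrome (link[1] = 0), transitions to[v][c] are Python dicts, and indices are 0-based (s[:n] is S[1..n]). The function name is hypothetical. It sets diff and seriesLink on node creation exactly as in the snippet above and evaluates ans[n] with the series-link loop, scanning each series in full, so its worst case is still quadratic.

```python
def palindromic_length_series(s):
    """ans[i] = palindromic length of s[:i]; getMin scans each series in full."""
    length, link, to = [-1, 0], [0, 0], [{}, {}]  # node 0: length -1, node 1: empty
    diff, series_link = [0, 0], [0, 0]
    max_suf = 1
    ans = [0] * (len(s) + 1)

    def preceded_by(v, n, c):
        # is the suffix-palindrome v of s[:n-1] preceded by c? (always true for node 0)
        p = n - length[v] - 2
        return p >= 0 and s[p] == c

    for n in range(1, len(s) + 1):
        c = s[n - 1]
        x = max_suf
        while not preceded_by(x, n, c):
            x = link[x]
        if c not in to[x]:  # create the node y = cxc together with its links
            y = len(length)
            length.append(length[x] + 2)
            if length[y] == 1:
                link.append(1)
            else:
                z = link[x]
                while not preceded_by(z, n, c):
                    z = link[z]
                link.append(to[z][c])
            to.append({})
            to[x][c] = y
            diff.append(length[y] - length[link[y]])
            if diff[y] == diff[link[y]]:
                series_link.append(series_link[link[y]])
            else:
                series_link.append(link[y])
        max_suf = to[x][c]

        best = n  # n one-letter palindromes always work
        v = max_suf
        while length[v] > 0:  # one iteration per series
            u = v
            while length[u] > length[series_link[v]]:  # naive getMin(v)
                best = min(best, ans[n - length[u]] + 1)
                u = link[u]
            v = series_link[v]
        ans[n] = best
    return ans
```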


The getMin function has linear worst-case time complexity, and we are going
to speed it up to constant time. By the series of a palindrome u we mean
the sequence of nodes in the suffix path of u from u (inclusive) to seriesLink[u]
(exclusive). Note that getMin(u) loops through the series of u. Comparing diff[u]
and diff[link[u]], we can check whether the series of u contains just one palindrome.
If this is the case, then res = ans[n − len[u]] + 1 can be computed in O(1)
time. Hence, below we are interested in series of at least two elements. A
suffix-palindrome u of S is called leading if either u = maxSuf(S) or u = seriesLink[v]
for some suffix-palindrome v of S. We need four auxiliary lemmas.

Lemma 7. If a palindrome v of length l ≥ n/2 is both a prefix and a suffix of a
string S[1..n], then S is a palindrome.

Proof. Let i ≤ n/2. Then S[i] = v[i] = v[l − i + 1] = S[n − i + 1], i.e., S is a
palindrome by definition. □

Lemma 8. Suppose v is a leading suffix-palindrome of a string S[1..n] and u =
link[v] belongs to the series of v. Then u occurs in v exactly two times: as a suffix
and as a prefix.

Proof. Let i = n − |v| + 1. Then v = S[i..n], u = S[i + diff[v]..n] = S[i..n − diff[v]].
Since diff[v] = diff[u] ≤ |u|, the two mentioned occurrences of u touch or overlap.
If there exist k, t such that i < k < i + diff[v] and S[k..t] = u, then S[k..n] has u
as both a prefix and a suffix and |S[k..n]| < |v| ≤ 2|u|, so it is a palindrome by
Lemma 7. This palindrome is a proper suffix of v and is longer than link[v], which
is impossible. □

Lemma 9. Suppose v is a leading suffix-palindrome of a string S[1..n] and
u = link[v] belongs to the series of v. Then u is a leading suffix-palindrome
of S[1..n − diff[v]].

Proof. If u is not leading, then the string S[1..n − diff[v]] has a suffix-palindrome
z = S[j..n − diff[v]] with link[z] = u and diff[z] = diff[u]. Since u is both a prefix
and a suffix of z and |z| = |v| ≤ 2|u|, clearly z = v. Then w = S[j..n] is a
palindrome by Lemma 7. Assume that w has a proper suffix-palindrome v′ which is
longer than v. Then v′ begins with u, and this occurrence of u is neither a prefix
nor a suffix of z = S[j..n − diff[v]], contradicting Lemma 8. Therefore, v = link[w]
and diff[w] = diff[v], which is impossible because v is leading. This contradiction
proves that u is leading. □


Lemma 10. In an eertree, a path consisting of series links has length O(log n).

Proof. Follows from [m Lemma 6], since any leading suffix-palindrome is also
leading in terms of [12]. □


By Lemma 10, the computation of ans[n] calls getMin O(log n) times. Now consider
an O(1)-time implementation of getMin. Recall that it is enough to analyze
non-trivial series of palindromes; they look like in Fig. 2. The first positions of
all palindromes in the depicted series of v and link[v] match (because diff[v] =
diff[link[v]]) except for the last palindrome in the series of v.


[Figure 2: rows S[1..n], ans[1..n] and S[1..n − diff[v]], with the series of v and of link[v] marked.]

Fig. 2. Series of a palindrome v in S[1..n] and of link[v] in S[1..n − diff[v]]. Leading
palindromes of the next series are shown by dashed lines. The function getMin(v) returns
the minimum of the values of ans in the marked positions, plus one.


We see that diff[v] steps before, we already computed the minimum of all but
one of the required numbers. If we memorize the minimum at that moment, we can
use it now to obtain getMin in constant time. We store such a minimum as an
additional parameter dp of the node of the eertree, updating it each time the
palindrome represented by the node becomes a leading suffix-palindrome. Lemmas
8 and 9 ensure that when we access dp[link[v]] to compute getMin(v), it is
exactly the value computed diff[v] steps before. The computations with dp can
be performed inside the getMin function:


int getMin(v)
    dp[v] = ans[n - (len[seriesLink[v]] + diff[v])]   // last element of the series
    if (diff[v] == diff[link[v]])                     // non-trivial series
        dp[v] = min(dp[v], dp[link[v]])
    return dp[v] + 1




















Here dp[v] is initialized by the value of ans in the position preceding the last
element of the series of v. There is nothing to do if this series has no other
elements; if it has, the minimum value of ans in the corresponding positions is
available in dp[link[v]].
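The same kind of sketch with the constant-time getMin, under the same assumptions (sentinel nodes 0 and 1, dict transitions, 0-based Python strings, hypothetical function name). Only the inner loop differs from the naive version: dp[v] is refreshed every time v is visited as a leading suffix-palindrome, and dp[link[v]] is reused for non-trivial series. Assuming O(1) dictionary access, the loop makes O(log n) iterations per symbol by Lemma 10.

```python
def palindromic_length_fast(s):
    """ans[i] = palindromic length of s[:i], using series links and the dp trick."""
    length, link, to = [-1, 0], [0, 0], [{}, {}]  # node 0: length -1, node 1: empty
    diff, series_link, dp = [0, 0], [0, 0], [0, 0]
    max_suf = 1
    ans = [0] * (len(s) + 1)

    def preceded_by(v, n, c):
        # is the suffix-palindrome v of s[:n-1] preceded by c? (always true for node 0)
        p = n - length[v] - 2
        return p >= 0 and s[p] == c

    for n in range(1, len(s) + 1):
        c = s[n - 1]
        x = max_suf
        while not preceded_by(x, n, c):
            x = link[x]
        if c not in to[x]:  # create the node y = cxc together with its links
            y = len(length)
            length.append(length[x] + 2)
            if length[y] == 1:
                link.append(1)
            else:
                z = link[x]
                while not preceded_by(z, n, c):
                    z = link[z]
                link.append(to[z][c])
            to.append({})
            to[x][c] = y
            diff.append(length[y] - length[link[y]])
            same = diff[y] == diff[link[y]]
            series_link.append(series_link[link[y]] if same else link[y])
            dp.append(0)
        max_suf = to[x][c]

        best = n
        v = max_suf
        while length[v] > 0:  # one iteration per series
            # ans at the position preceding the last element of v's series
            dp[v] = ans[n - (length[series_link[v]] + diff[v])]
            if diff[v] == diff[link[v]]:  # non-trivial series: reuse the stored minimum
                dp[v] = min(dp[v], dp[link[v]])
            best = min(best, dp[v] + 1)
            v = series_link[v]
        ans[n] = best
    return ans
```

Comparing it with a brute-force evaluation of the recurrence on all short strings is a cheap way to validate an implementation like this one.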


Remark 5. Series links can replace quick links in the construction of eertrees.
Recall that in the method of quick links (Sect. 3.2), after checking the symbols
in S preceding v and link[v], we assign quickLink[v] to v and repeat the process
until the required symbol is found or the node −1 is reached. With series links,
the termination condition is the same, but the process is a bit different. We first
put v = maxSuf(S) and check the symbol before v. Then we keep repeating two
operations: check the symbol preceding link[v] and assign seriesLink[v] to v. In
this way, all “skipped” symbols, including the symbol preceding v, equal the
symbol preceding the previous value of link[v]. (This is due to the periodicity of v;
for details see, e.g., [m, Sect. 2].) The number of iterations of the cycle equals
the number of series of suffix-palindromes of S, which is O(log n) by Lemma 10.


Remark 6. Let t_i be the number of series of suffix-palindromes for the string
S[1..i]. Our computation of palindromic length* performs, on each step, the following
operations. For the eertree: at most t_i + 1 symbol comparisons (Remark 5)
and one O(log σ)-time access to a dictionary. For palindromic length: t_i calls to
getMin, each filling one cell in dp, and filling one cell in ans.

The algorithm by Fici et al. [3, Figure 8] on each step builds three arrays
(G, G′, G″), each containing t_i triples of numbers; in total, 9t_i cells are filled.
So our algorithm should work significantly faster.


Now we return to the k-factorization problem.


Proposition 10. Using an eertree, the k-factorization problem for a length n
string can be solved online in O(n log n) time.

Proof. The above algorithm for palindromic length can be easily modified to
obtain both the minimum odd number of palindromes and the minimum even number
of palindromes needed to factor a string. Instead of ans and dp, one can maintain
in the same way four parameters: ans_o, ans_e, dp_o, dp_e, to take parity into account.
Now ans_o (resp., ans_e) uses dp_e (resp., dp_o), while dp_o (resp., dp_e) uses ans_o (resp.,
ans_e). The reference to Lemma 6 finishes the proof. □
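A Python sketch of the parity version, under the same assumptions as the earlier sketches (node 0 of length −1, node 1 empty, dict transitions, 0-based strings; the names min_odd_even and k_factorable are hypothetical). Following the proof, ans_o reads dp_e and ans_e reads dp_o, while each dp is refreshed from the answer array of its own parity; impossible values are float("inf"). The decision rule in k_factorable combines the two minima with Lemma 6.

```python
INF = float("inf")


def min_odd_even(s):
    """Return (ans_o, ans_e): minimum odd / even number of palindromes factoring s[:i]."""
    length, link, to = [-1, 0], [0, 0], [{}, {}]  # node 0: length -1, node 1: empty
    diff, series_link = [0, 0], [0, 0]
    dp_o, dp_e = [INF, INF], [INF, INF]
    max_suf = 1
    ans_o = [INF] * (len(s) + 1)
    ans_e = [INF] * (len(s) + 1)
    ans_e[0] = 0  # the empty prefix is a product of zero palindromes

    def preceded_by(v, n, c):
        p = n - length[v] - 2
        return p >= 0 and s[p] == c

    for n in range(1, len(s) + 1):
        c = s[n - 1]
        x = max_suf
        while not preceded_by(x, n, c):
            x = link[x]
        if c not in to[x]:
            y = len(length)
            length.append(length[x] + 2)
            if length[y] == 1:
                link.append(1)
            else:
                z = link[x]
                while not preceded_by(z, n, c):
                    z = link[z]
                link.append(to[z][c])
            to.append({})
            to[x][c] = y
            diff.append(length[y] - length[link[y]])
            same = diff[y] == diff[link[y]]
            series_link.append(series_link[link[y]] if same else link[y])
            dp_o.append(INF)
            dp_e.append(INF)
        max_suf = to[x][c]

        v = max_suf
        while length[v] > 0:
            pos = n - (length[series_link[v]] + diff[v])
            dp_o[v], dp_e[v] = ans_o[pos], ans_e[pos]
            if diff[v] == diff[link[v]]:
                dp_o[v] = min(dp_o[v], dp_o[link[v]])
                dp_e[v] = min(dp_e[v], dp_e[link[v]])
            ans_o[n] = min(ans_o[n], dp_e[v] + 1)  # odd = even prefix + one palindrome
            ans_e[n] = min(ans_e[n], dp_o[v] + 1)  # even = odd prefix + one palindrome
            v = series_link[v]
    return ans_o, ans_e


def k_factorable(s, k):
    """By Lemma 6, s is k-factorable iff k reaches the minimum of its parity and k <= |s|."""
    ans_o, ans_e = min_odd_even(s)
    best = ans_o[len(s)] if k % 2 else ans_e[len(s)]
    return best <= k <= len(s)
```

For example, abaab has minimum even count 2 and minimum odd count 3, so it is k-factorable exactly for k = 2, 3, 4, 5.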


4.2 Towards a linear-time solution 

A big question is whether palindromic length can be found faster than in
O(n log n) time. First of all, it may seem that the bound O(n log n) for our
algorithm is imprecise. Indeed, for building an eertree we scan only O(n) suffix-palindromes
even when we use just suffix links (see the proof of the corresponding proposition).
For palindromic length, on each step we run through the suffix-palindromes,
possibly skipping many of them due to the use of series links. Can this number
of scanned palindromes be O(n) as well? As was observed in [3], the answer is
“yes” on average, but “no” in the worst case: processing any length n prefix of
the famous Zimin word, one should analyze Θ(n log n) series of palindromes (all
of them 1-element, but this does not help).

* See http://ideone.com/xE2k6Y for an implementation.

Below we design an O(n) offline algorithm for building an eertree of a length
n string S over the alphabet {1, ..., n}, getting rid of the log σ factor in online
algorithms. Then we discuss ideas which may help to obtain the palindromic
length from an eertree in linear time. The offline algorithm consists of four steps.

1. Using Manacher’s algorithm, compute arrays oddR and evenR, where oddR[i]
(resp. evenR[i]) is the radius of the longest subpalindrome of S with the center
i (resp., i + 1/2).

2. Compute the longest and the second longest suffix-palindromes for any prefix
of S. We use variables ℓ and ℓ′ such that after the rth iteration the string S[ℓ..r]
(resp., S[ℓ′..r]) is the longest (resp., second longest) suffix-palindrome of S[1..r].


for (r = 1; r <= n; r++)
    ℓ--
    while (!isPal(S[ℓ..r]))
        ℓ++
    ℓ′ = max(ℓ′ - 1, ℓ + 1)
    while (!isPal(S[ℓ′..r]) && (ℓ′ < r))
        ℓ′++
    C[(ℓ + r) / 2].push(1, r)          // longest suffix-palindrome ending at r
    C[(ℓ′ + r) / 2].push(2, r)         // second longest suffix-palindrome ending at r


The function isPal, checking whether a given substring is a palindrome, works
in O(1) time, using the value obtained on step 1 for the center (ℓ + r)/2. Each
element of the array C is a linked list; the indices are both integers and half-integers.
The internal cycles make at most 2n increments of each of the variables
ℓ, ℓ′; hence, the whole step works in linear time.

3. Build the suffix array SA and the LCP array for S; for the alphabet {1, ..., n},
this can be done in linear time [m]. Recall that LCP[i] is the length of the longest
common prefix of S[SA[i]..n] and S[SA[i − 1]..n].

4. Recall from Sect. 2.3 that an eertree consists of two tries, containing the right
halves of odd-length and even-length palindromes, respectively. Build each of
them using a variation of the algorithm constructing a suffix tree from a suffix
array and its LCP array [S]. The algorithm for odd-length palindromes is given
below; the algorithm for even lengths is essentially the same, so we omit it.


path = (-1)                            // stack for the current branch of the trie
for (i = 1; i <= n; i++)
    k = SA[i]                          // start processing palindromes centered at k
    while (path.size() > LCP[i] + 1)
        path.pop()
    // path[j] is the node at depth j, i.e. the palindrome with right half S[k..k+j-1]
    for (j = path.size(); j <= oddR[k]; j++)       // can be empty
        path.push( newNode(path.top(), S[k + j - 1]) )
    for (j = 1; j <= C[k].size(); j++)
        (rank, r) = C[k][j]
        node[rank][r] = path[r - k + 1]


The function newNode(v, a) returns a new node attached to the node v with
the edge labeled by a. Array node[1][1..n] (resp., node[2][1..n]) contains links to
the longest (resp., second longest) palindromes ending in given positions. Now
we estimate the working time of this algorithm. The outer cycle works in O(n) time
plus the time for the inner cycles. The number of pop operations is bounded by
the number of pushes, and the latter is the same as the number of nodes in the
resulting eertree, which is O(n). The total number of iterations of the third inner
cycle is the number of palindromes stored in the whole array C; this is exactly
2n, see step 2. Thus, the algorithm works in O(n) time.

After running both the above code and its modification for even-length palindromes,
we obtain the eertree without suffix links and the arrays node[1], node[2].
From these arrays the suffix links can be computed trivially:

for (i = 1; i <= n; i++)
    link[node[1][i]] = node[2][i]


Thus we have proved 

Proposition 11. The eertree of a length n string over the alphabet {1, ..., n}
can be built offline in O(n) time.
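Steps 1 and 2 of the offline algorithm can be checked in isolation. The Python sketch below is a hypothetical illustration, not the paper's code: it uses 0-based indices, the standard two-pass form of Manacher's algorithm (odd[i] counts the center, so it is the length of the right half), and a dict keyed by ℓ + r (twice the center) instead of an array indexed by half-integers. For each position r, the pair (first[r], second[r]) gives the start positions of the palindromes that the construction stores as node[1][r] and node[2][r].

```python
def manacher(s):
    """odd[i]: right-half length (center included) of the longest odd palindrome
    centered at i; even[i]: half-length of the longest even palindrome centered
    between i-1 and i (0-based indices)."""
    n = len(s)
    odd, even = [0] * n, [0] * n
    l, r = 0, -1
    for i in range(n):
        k = 1 if i > r else min(odd[l + r - i], r - i + 1)
        while i - k >= 0 and i + k < n and s[i - k] == s[i + k]:
            k += 1
        odd[i] = k
        if i + k - 1 > r:
            l, r = i - k + 1, i + k - 1
    l, r = 0, -1
    for i in range(n):
        k = 0 if i > r else min(even[l + r - i + 1], r - i + 1)
        while i - k - 1 >= 0 and i + k < n and s[i - k - 1] == s[i + k]:
            k += 1
        even[i] = k
        if i + k - 1 > r:
            l, r = i - k, i + k - 1
    return odd, even


def suffix_palindromes(s):
    """Step 2: starts of the longest / second longest suffix-palindromes of s[:hi+1]."""
    odd, even = manacher(s)

    def is_pal(lo, hi):  # O(1) test of s[lo..hi] (inclusive) via the radii
        size = hi - lo + 1
        if size <= 0:
            return True  # the empty string
        if size % 2:
            return odd[(lo + hi) // 2] >= (size + 1) // 2
        return even[(lo + hi + 1) // 2] >= size // 2

    first, second, centers = [], [], {}
    lo = lo2 = 0
    for hi in range(len(s)):
        lo = max(lo - 1, 0)
        while not is_pal(lo, hi):
            lo += 1
        lo2 = max(lo2 - 1, lo + 1)
        while lo2 <= hi and not is_pal(lo2, hi):
            lo2 += 1
        first.append(lo)
        second.append(lo2)  # lo2 == hi + 1 encodes the empty palindrome
        centers.setdefault(lo + hi, []).append((1, hi))
        centers.setdefault(lo2 + hi, []).append((2, hi))
    return centers, first, second
```

The dict plays the role of the array C; it holds exactly two entries per position, which is where the 2n bound in the running-time estimate comes from.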

Now return to the palindromic length. Even with an O(n) preprocessing for
building the eertree, we still need O(n log n) time for factorization. Note that in
[12] an O(kn log n) algorithm for k-factorization was transformed into an O(kn)
algorithm using bit compression (the so-called method of four Russians). That
algorithm produced a k × n bit matrix (showing whether the jth prefix of the
string is i-factorable), so such a speed-up method was natural. In our case we
work with integers, so the direct application of bit compression is impossible.
However, we have the following property.

Lemma 11. If S is a string of palindromic length k and c is a symbol, then the
palindromic length of Sc is k − 1, k, or k + 1.

Proof. Any k-factorization of S plus the substring c gives a (k + 1)-factorization
of Sc. Suppose Sc has a t-factorization P_1 ⋯ P_t for some t ≤ k. Then P_t = Pc
has length > 1 (otherwise P_1 ⋯ P_{t−1} would be a (t − 1)-factorization of S, with
t − 1 < k). Hence, either P = c and S has the t-factorization P_1 ⋯ P_{t−1}c,
or P = cQ for a palindrome Q and S has the (t + 1)-factorization P_1 ⋯ P_{t−1}cQ.
In both cases k ≤ t + 1, and the result follows. □
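The case analysis in the proof is constructive. A small Python sketch (function names hypothetical; palindromes are plain strings) that turns a t-factorization of Sc into a factorization of S with at most t + 1 parts, which is exactly the bound giving k ≤ t + 1. It also covers the case |P_t| = 1, which the proof rules out for t ≤ k; there it returns t − 1 parts.

```python
def is_palindrome(p):
    return p == p[::-1]


def drop_last_symbol(parts):
    """From a t-factorization of Sc, build a factorization of S with at most t + 1 parts.

    Mirrors the case analysis in the proof of Lemma 11.
    """
    *head, last = parts
    c = last[-1]
    if len(last) == 1:              # P_t = c
        return head
    if len(last) == 2:              # P_t = cc, i.e. P = c
        return head + [c]
    return head + [c, last[1:-1]]   # P_t = cQc, i.e. P = cQ


print(drop_last_symbol(["aba", "cbbc"]))  # ['aba', 'c', 'bb']
```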

Consider an n × n bit matrix M such that M[i, j] = 1 if and only if S[1..j] is
i-factorable. For the jth column, we have to compute just two values: in the rows k − 1
and k, where k is the palindromic length of S[1..j − 1] (if M[k − 1, j] = M[k, j] = 0,
we write M[k + 1, j] = 1 by Lemma 11). For each value we should apply the OR
operation to log n bit values, for a total of 2n log n bit operations. If we are
able to arrange these operations naturally in groups of size log n, we will use
bit compression to get just O(n) operations. So we end the paper with the
following conjecture.


Conjecture 1. Using Lemma 11, an eertree, and the method of four Russians, it is
possible to find the palindromic length of a string in O(n log σ) time online and in
O(n) time offline.


5 Conclusion 

In this paper, we proposed a new tree-like data structure, named eertree, which 
stores all palindromes occurring inside a given string. The eertree has linear 
size (even sublinear on average) and can be built online in nearly linear time. 
We proposed some advanced modifications of the eertree, including the joint 
eertree for several strings, the version supporting deletions from a string, and 
the persistent eertree. 

Then we provided a number of applications of the eertree. The most important
of them are the new online algorithms for k-factorization, palindromic
length, the number of distinct palindromes, and also for computing the number
of rich strings up to a given length.

For further research we formulated a conjecture on the linear-time factorization
into palindromes and an open problem about the optimal construction of
the eertree.